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Abstract 

Mammalian genomes are replete with millions of polymorphic sites, among which those genetic variants that are colocated on the 
same chromosome and exist close to one another form blocks of closely linked mutations known as haplotypes. The linkage within 
haplotypes is constantly disrupted due to meiotic recombination events. Whole ensembles of such numerous haplotypes are sub- 
jected to evolutionary pressure, where mutations influence each other and should be considered as a whole entity — a gigantic matrix, 
unique for each individual genome. This idea was implemented into a computational approach, named Genome Evolution by Matrix 
Algorithms (GEMA) to model genomic changes taking into account all mutations in a population. GEMAhas been tested for modeling 
of entire human chromosomes. The program can precisely mimic real biological processes that have influence on genome evolution 
such as: 1) Authentic arrangements of genes and functional genomic elements, 2) frequencies of various types of mutations in 
different nucleotide contexts, and 3) nonrandom distribution of meiotic recombination events along chromosomes. Computer 
modeling with GEMA has demonstrated that the number of meiotic recombination events per gamete is among the most crucial 
factors influencing population fitness. In humans, these recombinations create a gamete genome consisting on an average of 48 
pieces of corresponding parental chromosomes. Such highly mosaic gamete structure allows preserving fitness of population under 
the intense influx of novel mutations (40 per individual) even when the number of mutations with deleterious effects is up to ten times 
more abundant than those with beneficial effects. 
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Introduction 

Humans have modest intraspecies genetic variations among 
mammals (Kaessmann et al. 2001 ; Zhang and Plastow 201 1 ). 
Even so, the number of genetic variations between two per- 
sons from the same ethnic group (e.g., Japanese, Finnish) is in 
the range of 3.4-5.2 millions as demonstrated by the "1000 
Genomes" International Sequencing Project (Abecasis et al. 
2012). This gigantic pool of nucleotide variations is constantly 
updating by 40-100 novel mutations arriving in each person 
(Kondrashov and Shabalina 2002; Conrad et al. 201 1; Li and 
Durbin 2011). Closely located mutations on the same DNA 
molecule are linked together forming haplotypes that are in- 
herited as whole units and span over a considerable portion of 



a gene or several neighboring genes (Consortium 2005). 
An intense intermixture of millions of mutations occurs in 
every individual due to frequent meiotic recombinations 
during gametogenesis. On an average, a haploid genome of 
a human gamete comprised 48 pieces of parental chromo- 
somes (see Section 2 of the supplementary file S1, 
Supplementary Material online [GEMA_lnstructions.pdf]). 
DNA recombination process causes gradual change of haplo- 
type structures from generation to generation. Several math- 
ematical theories have attempted to describe the intricate 
dynamics of genetic variations in populations. These models 
often conflict with each other and there is no universally 
acknowledged perception of genomic nucleotide dynamics 
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(Wagner 2008; Nei et al. 2010). Population genetics mathe- 
matical theories often consider mutations individually despite 
that natural selection, a major player in evolution, occurs 
simultaneously on entire ensembles of mutations in an organ- 
ism. It should be acknowledged that background selection 
and genetic hitchhiking deal with groups of neighboring 
mutations from the same locus (Hill and Robertson 1966; 
Stephan 2010), whereas Fisher, Wright, and later researchers 
considered interactions of mutations in a few loci (Fisher 1 930; 
Wright 1965; Bodmer and Felsenstein 1967; Gavrilets and 
Hastings 1 994). We suppose that mutations should be treated 
as a whole entity — a gigantic matrix of all genetic variations, 
unique for every individual genome. With this aim we devel- 
oped a computer program to process such matrices, named 
Genome Evolution by Matrix Algorithms (GEMA). Application 
of GEMA has already revealed new insights in population 
genetics presented in this article. This public program can be 
used for a broad range of investigations in the field of 
genomics. 

A key question in population genetics that has been inves- 
tigated for decades is: What is the probability of a certain 
mutation with a selection coefficient s to be fixed in a popu- 
lation? For a trivial case of a neutral mutation (when s = 0), 
there exists an undisputable solution to the problem inferred 
by the neutral theory of evolution. This theory predicts that the 
ultimate fixation probability of a novel neutral mutation 
(which is initially present as a single copy) is equal to 1/(2A/ e ), 
where A/ e is the effective size of the population (Kimura 1 983). 
Lately, Tomoko Ohta demonstrated that nearly neutral muta- 
tions (2Ns « 1) behave as if they are neutral (Ohta and 
Gillespie 1996). However, the general solution of this problem 
(when 0 and 2Ns product is not close to zero) is very con- 
voluted and depends on a number of parameters/variables 
characterizing organisms, populations, and environment. 
Moreover, these parameters have significant synergistic/an- 
tagonistic effects, making it impossible to infer fixation prob- 
ability even with the most advanced mathematical 
approaches. As we discuss further, even for the trivial case 
of neutral mutations (s = 0), the probability of fixation of a 
novel mutation, for a particular combination of parameters 
characterizing organism and population, might significantly 
deviate from the formula 1/(2A/ e ) of Kimura (1962), obtained 
using diffusion theory of stochastic processes. Mathematical 
theories in population genetics deal only with oversimplified 
models considering no more than two or three parameters at 
a time and predominantly examining a single or a few loci. 
Thus, the profound query by Sanford (2008) in "Genetic en- 
tropy" — "What will happen with mankind in the nearest 
future when each person has a hundred of novel muta- 
tions?" — remains unanswered. Instead of mathematical 
modeling, this problem can be approached more fruitfully 
from a different dimension, taking advantage of the enor- 
mous power of contemporary supercomputers. Computer 
modeling of genetic processes may be considered as an 



advanced branch of cellular automata, named by Wolfram 
(2002) as "A new kind of science." On numerous examples 
Wolfram demonstrated that any system of interacting ele- 
ments creates patterns within their arrangements, which are 
hard to predict mathematically yet trivial to reproduce and 
study computationally. Here, we implemented such compu- 
tational approach and present our program, named or GEMA. 
This program models the evolution of genomic sequences in a 
population under the influx of numerous mutations at mul- 
tiple loci and can take into account dozens of parameters/ 
variables simultaneously. It belongs to a forward-time simu- 
lation category (Carvajal-Rodriguez 2010) and implies a 
Wright-Fisher population modeling where generations do 
not overlap (Hartl and Clark 2006). GEMA has many features 
similar to previously published programs (GENOMPOP 
[Carvajal-Rodriguez 2008], SFS_CODE [Hernandez 2008], 
FREGENE [Chadeau-Hyam et al. 2008], Mendel's 
Accountant [Sanford et al. 2007] among others, reviewed 
by Carvajal-Rodriguez 2010). However, GEMA is designed 
specifically to answer important questions that have not 
been addressed with previous programs. Specifically, here 
we present a core program "GEMA_r1.pl" that models the 
influx of -50 novel point mutations per individual (the real rate 
observed in the human genome) in order to determine the 
genetic parameters most crucial for maintaining population 
fitness. We also introduce the advanced version 
"GEMA_r01 Java" that can precisely mimic real biological pro- 
cesses influencing genome evolution. It is designed to perform 
computational experiments to understand nonrandomness in 
genomic nucleotide compositions such as GC-isochores 
(Bernardi 2007), codon usage bias (Plotkin and Kudla 201 1), 
and midrange inhomogeneity regions (Bechtel et al. 2008; 
Prakash et al. 2009). 

Materials and Methods 

The simplest scheme of GEMA is demonstrated in figure 1 and 
its major steps are outlined below. 

Genomes and Individuals 

A large portion of a real genomic sequence (even whole chro- 
mosomes of human or other species) can be assigned as a 
reference genome for a model population. A user specifies the 
number of individuals in the population (A/). Each individual is 
constructed as a diploid genome that descended as two hap- 
loid gamete genomes from its parents. 

Mutations 

Taking a user-defined parameter [i (number of novel muta- 
tions per gamete), GEMA creates mutations in the genomic 
sequences using random number generator for choosing mu- 
tation positions. The relative frequencies of different types of 
mutations (e.g., T -> C or G -> C) can be defined in an input 
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Fig. 1. — GEMA begins with a genetically identical population of size 
N. Genomic mutations occur in each individual, which are passed onto 
offspring. According to the mutations inherited, fitness is calculated for 
each offspring. The N fittest offspring become the next generation and the 
process repeats for thousands of generations. Additional details on GEMA 
are provided in the Materials and Methods section, supplementary file S1 , 
Supplementary Material online (GEMA user guide), and our GEMA web 
page. 



table that approximates the observed frequencies in nature 
and can also take into account the local nucleotide context 
(option available for GEMA_r01 Java). Upon generation of a 
mutation, GEMA assigns a selection coefficient (s parameter) 
to the mutation using a user-defined s distribution. Note thats 
values are not normalized (see also GEMA user guide in sup- 
plementary file S1, Supplementary Material online). In the ad- 
vanced version of the program (GEMA_r01 Java), different 
genomics elements (exons, introns, noncoding-RNA, con- 
served elements, etc.) may have their own specific s 
distributions. 

Meiotic Recombination and Gametogenesis 

Haploid genomes of gametes are generated for each virtual 
individual from its parents' chromosomes. The number of mei- 
otic recombinations between parents' chromosomes is an 
input parameter (r). The recombination sites are defined by 
a random number generator, which can take into account the 
"hot-spots" and "cold-spots" for recombination events from 
the International HapMap Consortium genetic maps (option 
available for GEMA_r01 Java). 

Computation of a New Generation of Virtual Individuals 

Different mating schemes for virtual individuals are possible as 
input options. By default, we use random permanent pairings 
between male and female virtual individuals. Their offspring 
have diploid genomes formed by two randomly chosen 



parents' gametes. The number of offspring per individual (a) 
is a user-defined input parameter. 

Selection 

The overall fitness of every created virtual individual in the 
GEMA algorithm is computed by taking into account all the 
mutations possessed by this individual. The current version of 
GEMA does not take into account mutual effects of muta- 
tions, such as compensatory mutations and epistasis. GEMA 
calculates fitness for each gene by summing all the s values of 
mutations within that gene. For example, assume that for a 
human gene, its maternal allele is composed of a particular 
haplotype containing x number of SNPs and its paternal allele 
is composed of a different haplotype comprising y number of 
SNPs. The fitness of the maternal allele for the given gene (w m ) 
will be a sum of s values for all the x SNPs within this gene, 
whereas the fitness of the paternal allele (w p ) will be a sum of s 
values for all they SNPs. The fitness of the gene in this example 
is calculated from the w m and w p values and also another 
input parameter, the dominance coefficient (h). In a codom- 
inance mode (h = 0.5), the gene fitness is the average of the 
fitness of maternal and paternal alleles. Under a recessive 
mode (h=1), which corresponds to recessive genes, the 
fitness is the maximum between w m and w p values (hetero- 
zygotes with one deleterious allele are healthy), whereas for a 
dominant mode (h = 0), which corresponds to dominant 
genes, the gene fitness is the minimum between w m and 
w p values. For a general case, the gene fitness is calculated 
by the formula: w=m\n(w mi w p ) + h*abs(w m -w p ). Finally, 
the overall fitness of the virtual individual is the sum of fit- 
nesses of all genes inside the genome. In the selection phase 
of GEMA algorithm, the program picks the N fittest offspring 
and forms from them the new generation. This new genera- 
tion replaces the previous one and the entire cycle repeats for 
a user-defined number of generations. 

GEMA regularly outputs the following major parameters: 
Current generation, total fitness of the population, number of 
SNPs, total number of fixed mutations (F s ), and total number 
of mutations (Q) with selection coefficient s. In addition, all 
genotypes of each individual are stored in the backup files A 
and B and can be easily retrieved (see supplementary file S1, 
Supplementary Material online). 

A detailed description of GEMA algorithm is presented in 
the "GEMA_lnstructions.pdf" available from our web page: 
http/ybpg.utoledo.edu/~afedorov/lab/GEMA.html (last accessed April 
17, 2014) whereas a copy of it is presented in the supplementary 
file S1, Supplementary Material online. 

The programming codes for GEMA_r01 Java GEMA_r1.pl 
and pseudocodes are freely available from our lab web site 
http/i3pg.utoledo.edu/~afedorovylabGEMA.html (last accessed April 17, 
2014). Our lab web pages also have extensive explanations via video 
demonstrations. A discussion board has also been arranged for a 
broader public community to share experiences and concerns. 
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Results 

Several examples of GEMA computations are shown in 
figure 2. These graphs illustrate the modeled dynamics for 
the influx of mutations, 12% of which have positive selection 
coefficient (s>0) whereas the rest 88% have a negative 
effect (s<0). The distribution of mutations by s parameter 
has been modeled according to a decay curve, shown in the 
figure 3A When the number of meiotic recombination events 
was low (r=1, recombinations per gamete) and the rate 
of mutations was approximated to the one naturally ob- 
served for humans (|i = 20, mutations per gamete), the rel- 
ative fitness of individuals declined with generations. Yet, a 
higher degree of purifying selection pressure (corresponding 
to a larger number of offspring per individual — a-parameter) 
caused the decline of fitness to be less sudden with respect 
to increasing number of generations (see fig. 2A). 
These parameters are thoroughly explained in the User 
Guide for GEMA (in supplementary file S1, Supplementary 
Material online, p. 6-9) and also in the GEMA web site 
(http://bpg.utoledo.edu/~afedorovy1ab/GEMA.html, last accessed 



April 17, 2014) including GEMA_video_presentation.m4v, 
GEMAjava pseudocode, and other supplementary materials, 
Supplementary Material online. 

Figure 2B illustrates the model with two fixed parameters: 
[i = 20 and a = 5 (offspring per individual). The only variable 
parameter in this experiment was the number of meiotic re- 
combination events per gamete (r). The increase of r to 48 
prevented the declining of fitness. We specifically used r= 48, 
because it represents the average number of pieces of pater- 
nal and maternal genomes in a human gamete (on an aver- 
age, 35.2 pieces result from meiotic recombinations in 
autosomes and 11.5 pieces result from the existence of 23 
pairs of chromosomes). 

The variations of total number of single nucleotide poly- 
morphisms (SNPs) in generations are shown in the figure 2C 
and D. The latter picture exemplifies some peculiarities in the 
SNP dynamics under certain conditions. The gigantic peaks in 
the number of SNPs in the population were observed when 
the meiotic recombination rate was low (r< 1) and genes had 
a recessive mode (gene fitness of heterozygote is close to the 



A r=1; h =0.5; /<=20 B a=5; h=0.5; //=20 




Generations (x1 0.000) Generations (x 100,000) 



Fig. 2. — Exemplification of results from GEMA_r1 .pi and GEMA_r01 .java, illustrating evolutionary computations for 50 virtual individuals, each of whose 
genome is represented by human chromosome 22. (A) and (B) The change of relative fitness of individuals in population with respect to time (generations). In 
this modeling, we defined the distribution of mutations as a decay curve of selection coefficient (s), where 88% of mutations have negative s values and only 
12% have positives values (see fig. 3A). We do not normalize selection coefficient values, so the illustrated fitness of individuals is presented in relative units. 
Negative values of relative fitness show a decline in organism adaptability, whereas positive values indicate improvement. In these computational exper- 
iments, genes were assigned codominance mode (h = 0.5). In (A), how different numbers of offspring per individual (a = 3, 5, 8, or 10 offspring) influence 
the relative fitness, under the same recombination rate (r= 1) is demonstratd. In (B), how different numbers of recombination events per gamete (r= 1,10, 
20, or 48) affect the relative fitness whereas the number of offspring remained constant {a = 5) is demonstrated. (0 and (D) illustrate the dynamics of 
number of SNPs in the population. (0 Variations in the number of SNPs with respect to generations for four different values of novel mutations per gamete 
(fi = 2, 8, 20, or 30). In (D), smoothed number of SNPs (by taking averages for extended number of generations) in addition to emphasizing that under 
specific conditions (e.g., recessive genes in which the dominance mode h is close to 1) there may be considerable and long-lasting spikes in the number of 
SNPs when recombination rate is low (r< 1) is demonstrated. 
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maximal fitness of maternal and paternal alleles; dominance 
coefficient h = 1). This effect is also discussed below. 

We computed the probability of fixation of a novel muta- 
tion with the selection coefficients, which we denote as tt s . To 
make these results immediately understandable, we simplified 
the distribution of mutations by their selection coefficients to 
trivial cases, where a mutation has only three options for a 
possible s value: - 1 , 0, or + 1 . Two of such distributions, used 
in our modeling and named as experiments B and C, are 
shown in the figure 3B and C. In both the experiments B 
and C, mutations with s= -1 are nine times more abundant 
than those with s= +1 . However, in the experiment B, a ma- 
jority of mutations (90%) are neutral (s = 0) whereas in exper- 
iment C, neutral mutations represent a minor fraction (10%). 

By taking advantage that we can trace the fate of each 
mutation in the simulation experiments, we computed the 
probability of fixation of a novel mutation with the selection 
coefficient s, which we denote as n s . The probability of fixa- 
tion was calculated as 

7Ts=Fs/C s , (1) 

where Q is the number of novel mutations with selection 
coefficient s that occurred from generation 2,000 to 10,000 
in all offspring, whereas F s is the number of fixed mutations 
with selection coefficient s within the same period of 8,000 
generations. (After the first 2,000 generations, the population 
reaches equilibrium in the number of SNPs and subsequent 
consideration of the next 8,000 generations allows us to ac- 
quire sufficient statistics for fixed mutations.) Figures 4-6 
show values of n s for six different parameters: 1) N— size of 
the population (24, 50, or 100 individuals), 2) /x — number of 
novel mutations per gamete (1 , 5, 1 0, or 20), 3) r — number of 



ABC 




2 



Fig. 3. — Distributions of mutations by user-assumed selection coeffi- 
cients (s-values), which were used for modeling analysis. (A) A continuous 
distribution of mutations by s that can range from -20 to +20 depending 
on their deleterious (negative s values) or beneficial (positive s values) ef- 
fects. This curve represents 88% deleterious and 12% beneficial muta- 
tions. (B) A discrete distribution of mutations characterized predominantly 
by neutral mutations occurring at a frequency of 90% within the popu- 
lation, whereas the remaining 10% is characterized by deleterious and 
beneficial mutations occurring in a ratio of 9:1 . (O illustrates another dis- 
crete distribution for mutations, where the ratio of deleterious to beneficial 
mutations occurs again in the ratio of 9:1 . However, this model is charac- 
terized by a preponderance of mutations with deleterious effects (81 %). 
Neutral mutations in this case comprise 10% and beneficial — 9% of over- 
all nucleotide changes occurring within the population. 



meiotic recombinations events per gamete (1 or 48), 4) h — 
dominance coefficient (0, 0.5, or 1), 5) a — number of off- 
spring per individual (2, 5, or 10), and 6) D — distribution of 
novel mutations by selection coefficients (experiments B or C 
shown in fig. 3B and C, respectively). The original tables 
with these complete data sets are provided in the supplemen- 
tary tables S1 and S2, Supplementary Material online. These 
figures 4 and 5 and supplementary tables S1 and S2, 
Supplementary Material online, demonstrate intricacies in var- 
iations of 7T S as a function of six arguments: 7t s = tt s {N, /x, r, h, 
a, D) We detail below some of the major consequences of 
these dependencies. 

In our computer experiments, the level of selection pres- 
sure is measured as the number of offspring per individual 
(a). The GEMA settings in all the described experiments 
were always based on "survival of only the fittest" and a 
constant size of population (A/ is fixed for a particular com- 
putational experiment). Thus, when a = 2, the selection is 
completely turned off even for beneficial and deleterious 
mutations with s^0 (because no offspring are removed). 
The setting with a = 2 serves as a good control for the 
computational algorithm because in every experiment with 
a = 2, we observed that tt s {a = 2) was very close to 1/2/V 
for every value of s in accordance with Kimura's (1983) law 
for neutral mutations. Importantly, Kimura did not consider 
variations in the number of offspring per individual. His 
probability of ultimate fixation 7rs kimura js calculated based 
on the number of novel mutations in adults (a subset of 
offspring that reach adulthood and subsequently create next 
generation of offspring). For nearly neutral mutations, 
7T s kimura can be calculated from our n value from formula 
(1) by simple normalization: 7r s kimura = 7r s xoc/2. (Observe that 
this normalization formula may not be correct for beneficial 
mutations, where fixation probability might turn out to be 
>1 post the normalization.) In a majority of GEMA compu- 
tations when selection is turned on (number of offspring is 
>2), the 7r s=0 kimura , denoting the probability of fixation of 
neutral mutations (s=0), follows Kimura's law and is very 
close to 1/2/V. In other words, the product of three of our 
parameters tt s=0i a, and N approximates to 1 (7r s=0 xaxA/ =■ 
1). However, for a specific set of parameters, 7r s=0 kimura 
significantly deviates from 1/2/V. For example, for (a =10, 
h = 0, r=1, D = expC, u= 1, N =50) the product 
of 7t s=0 xaxN equals 2.13 instead of being equal to 1. 
For another set of conditions (a =10, h=1, r=1, 
D = expC, |i=1, A/=50), the product of 7z s=0 xaxN 
equals 0.85 (for details, see supplementary tables S1 and 
S2, Supplementary Material online). The anomalies from 
Kimura's law resulted from numerous mutations in individ- 
uals being linked together as multiple haplotypes from var- 
ious genomic loci (because r is low). Neutral mutations are 
linked with nonneutral ones and all mutations within a hap- 
lotype are selected as a whole unit. The length of haplo- 
types is in the reverse proportion to the recombination rate 
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Fig. 4. — Dependence of the probability of fixation n s of mutations with beneficial effects. The effects of mutations have been illustrated in our model 
according to selection coefficients exemplified by values of + 1, 0, and -1 for beneficial, neutral, and deleterious mutations, respectively. Individual 3D plots 
demonstrate the quantitative behavior of fixation of mutations as interplay of different parameters represented by population size (A/), recombination rate 
(r), variations in influx of novel mutations (/x), mode of dominance (h), number of off springs (a), and predominance of either neutral mutations (according to 
fig. 3B) or deleterious mutations (according to fig. 30- Exact values of all parameters are provided in supplementary tables S1 and S2, Supplementary 
Material online. 



(r). The data from figure 6 demonstrate that the highest 
deviations of 7r s=0 from neutrality law were observed for 
the lowest recombination rate when r= 1 . 

The size of a population considerably influences the fixation 
probabilities tt s in such a way that the average fitness of the 
population always improves via increasing its size (A/). 
Supplementary tables S1 and S2, Supplementary Material 
online, demonstrate how a growth of N changes tt s values 
for deleterious and beneficial mutations. GEMA simulations 
are in concordance with the well-known observations that 
deleterious and neutral mutations have a higher chance to 
be fixed in small populations due to random drift (Small 
et al. 2007). We also observed that the rate of fixation for 
beneficial mutations depends on N. Yet, the change of 7r s>0 
with respect to N was much lower than that observed for 
deleterious and neutral mutations. 



After Haldane publication in 1927, it is generally accepted 
that the probability of fixation of beneficial mutations (7r s>0 ) in 
large populations should be twice greater than s (n ^ 2 s) 
(Haldane 1927; Patwa and Wahl 2008; Charlesworth and 
Charlesworth 2010; Chelo et al. 2013). This formula was 
mathematically derived through consideration of branching 
Galton-Watson process of chance extinction of a new muta- 
tion in a stationary population where individuals have Poisson 
distributed number of offspring with variance equal to 1. 
However, our GEMA results demonstrate that the probability 
of fixation of a beneficial mutation n also notably depends on 
the combination of the six aforementioned parameters (A/, [i, 
r f h, ot, D). This phenomenon can be explained by the linkage 
of deleterious, beneficial, and neutral mutations within hap- 
lotypes and selecting them as whole units. The most dramatic 
example of such linkage is presented in the figure 2D. It shows 
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Fig. 5. — Dependence of the probability of fixation 7i s of mutations with deleterious effects (s 



-1). All parameters are the same as in figure 4. 



a computational experiment for recessive genes (h = 1) with 
low level of recombination (r<1). In this model, beneficial 
mutations happen to occur spontaneously in a small fraction 
of genes. Let us consider one of such genes, denoted by A. 
We further assume that A acquired beneficial mutations by 
chance that are on their way for a rapid fixation. At the same 
time, neighboring genes (let us call them B and 0 are likely to 
gradually accumulate deleterious mutations (which are more 
abundant than beneficial ones in our experiments). The mu- 
tations in all neighboring genes A, B, and Care linked together 
within a single haplotype because recombination rate is set to 
be low. Under a recessive mode of dominance, the effect of 
deleterious mutations in B and C is negligible until their 
frequency is low. These linked beneficial and deleterious 
mutations are long trapped as clustered SNPs that can neither 
be easily fixed nor drifted away. The increase of this haplotype 
frequency causes a prevalence of negative effects on fitness 
from genes B and C, averting the fixation of all mutations 
within this haplotype. On the other hand, a decrease of fre- 
quency of this haplotype causes a significant decline in the 
negative effects from genes B and C. So in this case, the 



positive effects of beneficial mutations in gene/\ start prevail- 
ing and thereby forestall the complete loss of this particular 
haplotype. Thus, such specific combinations of parameters 
(r< 1, h = 1) can cause a dramatic instability of the number 
of SNPs as observed in GEMA computations. Peculiarities of 
such unstable SNP dynamics can be observed in either the 
gigantic peaks of SNPs numbers (fig. 2D) or the gradual accu- 
mulation of SNPs with severe fluctuations (the latter occurs 
when r is significantly lower than 1 recombination per 
gamete). 

Finally, using GEMA modeling, we investigated the K/\i 
ratio of the number of fixed mutations per generation (K) to 
the number of novel mutations per gamete (/x). Kimura (1 983) 
demonstrated that under neutral selection conditions the K/[i 
ratio is equal to 1 . In 2008, Chen et al. advanced the mathe- 
matical apparatus for the neutral theory generalizing it for 
incomplete dominance (0</?<1), overdominance (h<0), 
and underdominance (h>1) modes and characterized the 
effects of dominance on the probability of fixation of a 
mutant allele. However, mathematical models do not consider 
the following problems: 1)The linkage between nearly neutral 
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Fig. 6. — Dependence of the probability of fixation tt s of mutations with neutral effects (s = 0). All parameters are the same as in figure 4. Note that for 
comparison of these n values with Kimura's law, they should be normalized by taking into account the number of offspring per individual as described in the 
Results section (7r s kimura = tt s x a/2). 



mutations and beneficial/deleterious ones through formation 
of haplotypes that may be not neutral and 2) the selection that 
is carried out simultaneously on the entire pool of genes. 
GEMA computations have revealed that even under the 
influx of predominantly neutral mutations (90%, experiment 
B), a significant deviation of the K/\i ratio from 1 may be ob- 
served. Figures 7 and 8 demonstrate that the K/\i ratio de- 
pended on all of the considered parameters (A/, /x, r f In, a, D). 
These results show that the K/\i ratio varied from 2.5 to 0.78, 
under realistic conditions for human population. In experiment 
C, with less neutral mutations, the deviations of K/\i ratio from 
1 are significantly higher. 

Discussion 

The ultimate goal of our GEMA project is to make a compu- 
tational model for the evolution of human genome at as close 
to natural conditions as possible. A major challenge for such 
simulations is the gigantic size of the genome. Processing this 
entity of more than 3 billion nucleotides is possible only on 



advanced supercomputers running for many days. Hence, at 
this initial stage of GEMA project we take a portion of the 
human DNA sequence (which may be a considerable section 
or even a whole chromosome) and assume it to be the entire 
genome of our virtual individuals. Other computation simula- 
tions have also conceived a large chromosomal segment 
modeling an entire genome (Chadeau-Hyam et al. 2008; 
Kiezun et al. 2013). During these previous simulations, the 
authors considered the same number of mutations and mei- 
otic recombinations in the modeling genome as their particu- 
lar chromosomal segment has in reality. Such an approach 
ignores the existence of vast majority of other mutations 
that constantly occur in other chromosomes and which may 
interfere with the modeling chromosomal segment. In this 
respect, our GEMA approximations are completely opposite. 
Inside the modeling chromosomal segment, which we con- 
sider as the genome of virtual individuals, we introduce the 
entire influx of mutations and meiotic recombination events 
that are observed for the whole human genome. Our ap- 
proach ignores the existence of a majority of genes. 
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Fig. 7. — Graphical illustrations of deviations of K/{i ratio from 1 with respect to change of number of novel mutations per gamete for particular sets of 
parameters (N, r, h, a, D). K stands for the number of fixed nucleotides in each generation, whereas n is the number of novel mutations per gamete. The 
graphs are obtained on the basis of predominant pool of neutral mutations, modeled by experiment B for s distribution (see fig. 3B). Within each graph, 
variations in the ratio of /C/ji have been calculated for varying number of offspring (a) within the population (green a = 2; red a = 5; blue a = 1 0). In toto, the 
interplay of various parameters such as recombination rate (r), dominance coefficient (h), population size (A/), novel mutations per gamete (ji), number of 
offspring (a), and overall effect of mutation pool (deleterious, beneficial, or neutral) has been represented as causal factors for deviations from previously 
assumed unitary ratio of K/\i. 



However, in numerous computational experiments we dem- 
onstrated that the exact number of genes (like 600 vs. 6,000 
genes) or gene length (like 1,000 nts versus 1 0,000 nts) does 
not influence the main results in our focus such as the fitness 
of individuals and the number of SNPs in the population 
during evolution. This observation inclines us to think about 
the fruitfulness of our approach for the assessment of the 



recombination and mutation rates on the fitness and mutation 
dynamics. In GEMA simulations, the selection and evolution 
are implemented simultaneously on gigantic ensembles of 
mutations that are regrouped in every individual due to mul- 
tiple meiotic recombination events. Such modeling may reveal 
unknown features in dynamics of mutations, which we plan 
to present in the next publications. 
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Fig. 8. — Graphical illustrations of deviations of /C/fi ratio from 1 with respect to change of number of novel mutations per gamete (p) for particular sets of 
parameters (A/, r t h, a, D). The graphs are obtained on the basis of a prevalence of deleterious mutations, quantified by experiment C (see fig. 30- All 
parameters are the same as in figure 7. 



In this article, we primarily focused on the impact of meiotic 
recombinations on the population fitness. Our computer sim- 
ulations demonstrated that an increase in the number of re- 
combination events per gamete considerably improves the 
fitness of the population via increasing the probability of fix- 
ation of beneficial mutations and simultaneously decreasing 
the probability of fixation of deleterious mutations. This be- 
havior is in accordance with the fundamentals of classical pop- 
ulation genetics that acknowledge "the evolutionary 
advantage of recombination" (Felsenstein 1974) and, in par- 
ticular, the Hill-Robertson effect. However, the Hill-Robertson 



effect is rather a qualitative estimation showing recombination 
driven enhancement of a population's ability of fixation of 
favorable mutations. Textbooks on this topic do not provide 
quantitative estimations on how a specific change in recom- 
bination frequency impacts the probability of fixation of favor- 
able mutations (Hartl and Clark 2006; Durrett 2008; 
Charlesworth and Charlesworth 2010). The advantage of 
GEMA simulations lies in its ability to precisely measure the 
effect of a particular recombination rate (r-parameter) on the 
population fitness and probability of fixation of mutations 
with different selection coefficients. For example, let us 
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consider the results from the supplementary table S2, 
Supplementary Material online, for a chosen set of parameters 
(N= 1 00, a = 5, h = 0.5, [i = 5, and the distribution of selec- 
tion coefficients as in fig. 30- When the recombination rate 
was set to r= 1, the probability of fixation of neutral muta- 
tions was 7T( S=0 ) = 0.0022, beneficial ones was 
7T( S=+1) = 0.0082, and deleterious 7T(s=-i) = 0.00048. The in- 
crease of the recombination rate up to r=48 elevated the 
probability of fixation of beneficial mutations 2.7 times to 
7T( S=+1 ) = 0.022 and simultaneously reduced the probability 
of fixation of deleterious mutations 40 times to the 
7T( S= _ 1 ) = 0.00001 2, whereas the probability of fixation of 
neutral mutations was marginally changed [tt( S=0 ) = 0.00205]. 
Moreover, GEMA simulations demonstrated that the elevation 
of the influx of mutations also had a dramatic effect on the 
probability of fixation. For instance, doubling mutation rate to 
Ijl = 10 while keeping the same parameters as described ear- 
lier (N = 100, a = 5, h = 0.5, and r=48) caused the decrease 
in probability of fixation of beneficial mutations 1 .7 times to 
7r (s=+1) = 0.013 and simultaneously increased the probability 
of fixation of deleterious mutations 6.2 times to 
7T( S= _ 1 ) = 0.000074. When we quadrupled the mutation 
rate to /x = 20, the 7T( S=+1 ) became equal 0.0078 whereas 
7T( S __i) equaled to 0.00027. This example illuminates the abil- 
ity of GEMA simulations to evaluate the total effect of thou- 
sands of deleterious, beneficial, and neutral mutations under 
different conditions (gene dominance modes, recombination 
rates, population size, mating schemes, selection pressure, 
and various distribution of mutations by selection coefficients). 

Intricate dynamics of mutations in genomes depends on 
numerous parameters of a different nature including those 
that determine the following biological processes: 1) Level of 
selection pressure (number of offspring per individual and 
nonrandomness in formation of next generation from these 
offspring), 2) genetic drift (mainly determined by the popula- 
tion size), 3) population structure (e.g., mating schemes, pop- 
ulation subdivision, migrations, inbreeding), 4) genome 
structure and functioning (number and arrangement of 
genes, number of meiotic recombinations per gamete, distri- 
bution of dominance coefficients among genes, etc.), and 5) 
mutation characteristics (number of novel mutations per 
gamete, distribution of these mutations by their selection co- 
efficients, arrangement of mutations along genome, possible 
"mechanistic" fixation bias, etc.). In this introduction paper on 
GEMA, we considered only six parameters (A/, /x, r f h, a, D) 
and demonstrated that their specific combinations intricately 
and dramatically affect the fixation probability and fitness. Our 
multiple experiments with GEMA have confirmed that the 
probability of ultimate mutation fixation tt Si fitness of individ- 
uals, and the number of SNPs in the modeling population 
practically do not depend on the length of genes (L) and the 
number of genes (A/ genes ) in the genomes when A/ genes >> [i 
and A/ genes >> r. To increase the speed of computations, our 
presented data were obtained for A/ genes = 600 and L = 1 ,000 



nucleotides settings. Yet, these results should be the same as 
for the entire human genome (A/ genes « 25,000 and 
L « 35,000 bp). In other words, the total number of muta- 
tions and recombinations per individual and not the density of 
those mutations and recombinations per genomic length are 
important for dynamics of numerous mutations in population. 

We performed our computations using the core version of 
GEMA (GEMAji .pi). In these simulations, we did not use real 
mutation distributions in respect to the local nucleotide con- 
text or real gene sequences because they do not influence the 
main focus of this article, which is toward finding important 
parameters that preserve population fitness under intense 
influx of mutations. For other queries that require mimicking 
biological reality with much closer proximity, the extended 
version of GEMA (GEMA_r01 Java) should be used. It has 
many advanced features described on the web (http://bpg. 
utoledo.edu/~afedorov/lab/GEMA.html, last accessed April 
17, 2014). For example, the input of our program is real chro- 
mosomal DNA of mammals, on which positions of genes and 
functional elements are tabulated in input matrices. Then, 
mutations that are modeled by the program have the same 
frequencies and distributions as those observed in nature and 
computed from the SNP databases. Positions and frequency of 
modeled meiotic recombinations are also taken from the 
public databases describing these events (HapMap, NCBI 
[Frazer et al. 2007]). G EM A_r01 Java has several advanced 
features already build in including the availability of multiple 
environment option where each mutation is assigned a selec- 
tion coefficient vector 5 with coordinates representing scalar s 
values specific for each environment. We provide extensive 
training web pages regarding the usage of GEMA programs 
and have a strong commitment to help the scientific commu- 
nity in maximizing their preferred workflows. However, the 
usage of GEMA_r01 Java is computationally consuming and 
often requires supercomputer power, which we are unable to 
provide. GEMA_r01 Java can be applied to the investigation of 
many specific questions related to the fields of genomics 
and population genetics. Our lab is focused to use GEMA 
for verification of alternative ideas about the evolution of spe- 
cific genomic regions (isochores, third codon positions, etc.) 
and for investigation of genomic pattern formation and 
evolution. 

Supplementary Material 

Supplementary tables S1 and S2 and file S1 are available at 
Genome Biology and Evolution online (http://www.gbe. 
oxfordjournals.org/). 
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